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Abstract 

We derive herein the limiting laws for certain stationary distributions of birth-and-death 
processes related to the classical model of chemical adsorption-desorption reactions due to 
Langmuir. The model has been recently considered in the context of a hybridization reaction 
on an oligonucleotide DNA microarray. Our results imply that the truncated gamma- and 
beta- type distributions can be used as approximations to the observed distributions of the 
fluorescence readings of the oligo-probes on a microarray. These findings might be useful in 
developing new model-based, probe-specific methods of extracting target concentrations from 
array fluorescence readings. 
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1 Introduction 



High density oligonucleotide microarrays are a widely used modern bio-technology tool enabling the 
simultaneous testing for the presence as well as quantification of large numbers of genes in prepared 
target RNA samples. For a general introduction to this technology we refer to the celebrated 



paper [14j or to [13( for a more recent overview. Among several competing types of oligonucleotide 
microarrays, the Affymetrix GeneChip design appears to be currently one of the most common. 
GeneChip arrays consist of a substrate onto which short single strand DNA oligonucleotide probes 
have been synthesized using a photolithographic process. A chip surface is divided into some 
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hundreds of thousands of regions typically tens of microns in size, with the DNA probes within each 
region being synthesized to a specific nucleotide sequence. The target RNA sample is hybridized 
onto the chip to form probe-target duplexes, and the chip is scanned to obtain fluorescence intensity 
readings from dyes incorporated during the laboratory procedures. In principle, with suitable 
calibration, intensity readings are intended as a 'proxy' measure of the concentration of matching 
target RNA in the sample. However, due to optical noise, nonspecific hybridization, probe-specific 
effects, and measurement error, the empirical measures of expression (i.e., the scanner- measured 
fluorescences) that summarize probe intensities can often lead to imprecise and inaccurate results 
(see, e.g., Wu and Irizarry 



It seems that some potentially significant improvement in relating the scanner readings of the probe 
intensities to the target genes concentrations could be obtained by using a model-based approach 
accounting for the physical processes driving hybridization. Recently, some authors have begun to 
address this issues by appealing to the dynamic adsorption models well known in physical chemistry 
literature (see, Held et al. 0] or Burden et al. [lj). Such models stemming from the physics of the 
chemical reactions involved are especially valuable as they could also help us in better understanding 
of the physical processes driving hybridization and lead to improvements in both microarrays design 
and performance. 

One of the most popular adsorption models considered in the context of microarrays (cf. e.g., 
Hekstra et al. [5J or Burden et al. [H) is the so-called Langmuir model (see next section) which in 
its simplest deterministic form describes the relationship between concentration and fluorescence 
levels of probe-target complexes by means of a hyperbolic function. In the context of microarrays 
(in particular, GeneChips) in order to properly account for the effects of multiple simultaneous 
hybridizations as well as the cross-hybridization due to competition between similarly sequenced 
targets for the same probe regions, its seems that the stochastic version of the Langmuir model is 
needed. The analysis of such a model was carried out recently for instance in Burden et al. QJ or 
earlier in Newton et al. 12, 1(| by means of adopting the general results of Dennis and Patil [2| on 



the fluctuations of the stochastic diffusion equations around their stable equilibrium points. 

The model for the stochastic fluctuations of the equation described by Dennis and Patil was cast 
as a boundary-free problem and intended to provide a continuous diffusion-type approximation to 
the behavior of large biological systems as typically encountered in population dynamics problems. 
With no natural boundary restrictions it was argued in Dennis and Patil [2| that the fluctuations 
around stable equilibria are approximately distributed as a gamma random variable. Based on this 
argument the gamma model for gene expressions was since adopted by several authors in the context 
of analyzing microarray data (cf. e.g., Newton et al. [l2j], Newton and Kendziorski [ll|, Burden 
et al. Q). 

The simple extension of the Dennis and Patil results to microarrays setting, albeit appealing, seems 
to require further justification since the microarray hybridization models are neither continuous nor 
boundary-free. Whereas the continuous approximation to the large discrete system seems easily 
justifiable, it is not entirely clear what discrete system is being approximated by the boundary- free 
diffusion model (see (TSJ) below). 
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The purpose of the current paper is to formally derive some simple closed-form stochastic laws 
approximating the equilibrium distributions of the discrete stochastic hybridization reactions under 
the explicit assumptions on the random noise terms which are consistent with the stochastic Lang- 
muir model but, unlike the latter, are not boundary-free. The idea for the derivation is very simple. 
We start by noticing that the reaction rate equation of the deterministic Langmuir model may be 
considered as the usual approximation to the set of two coupled stochastic chemical reactions on 
the finite state space (i.e., probe region size for GeneChips). We then add the stochastic forcing of 
the Langmuir equation as an additional term to the original birth and death rates of this discrete 
chemical system. It turns out that the analysis of the equilibrium distribution of this adjusted birth 
and death process (we refer to it below as the Langmuir BD process) when the number of states is 
large leads to the stochastic laws which are, under most circumstances, consistent both with Dennis 
and Patil 0] approach as well as Burden et al. [l| results. However, due to the fact that we had 
based our analysis on a finite discrete system, unlike previous approaches, ours gives more insight 
into the boundary behavior of the underlying discrete stationary process and its approximations. 
In particular, when considering the discrete system it become obvious that an adjustment for the 
saturation effect is needed in the form of a Dirac-delta probability distribution at the boundary 
of the state space. This leads to an interesting consequence that the limiting stochastic law is 
not absolutely continuous as in Dennis and Patil result but rather has an atom at the state space 
boundary. We give some formal details of these findings in Section 3 below. 

Beyond the current introductory section the paper is organized as follows. In the next section (Sec- 
tion 2) we offer a brief overview of some of the results related to the classical Langmuir adsorption- 
desorption model in our context. Our main theorem on the limiting stochastic law for the stationary 
distributions of the Langmuir birth-and-death process in large state space along with some discus- 
sion is presented in Section 3 along with an outline of the proof. We conclude with some final 
remarks in Section 4. 



2 The Langmuir Model 

In 1916 Irving Langmuir devised a simple model involving a thermodynamic equilibrium to predict 
the fraction of solid surface covered by an adsorbate as a function of its gas pressure Q . The model 
was later extended to liquid systems, where the equilibrium involved concentrations in solution. In 
the Langmuir model adsorbate and solvent molecules compete to adsorb on sites on the surface of 
the powder and each site must be occupied by either a solvent molecule or an adsorbate molecule. 
For the hybridization reaction in oligonucleotide DNA-microarrays the same principle is applied in 
order to represent competing adsorption and desorption of RNA molecules to form probe-target 
complexes at the chip surface (see, e.g., Forman et al. [3]). 

Let u = u(t) G (0, 1) be the fraction of sites within a probe region occupied by probe-target com- 
plexes at time t after the commencement of hybridization, and di and d,2 be the forward adsorption 
and backward desorption rate constants respectively. The forward adsorption reaction is assumed 
to occur at a rate d\x{l — u), proportional to the RNA-target concentration x and fraction (1 — u) 
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of unoccupied probe sites. The backward reaction (desorption) is assumed to occur at a rate d 2 u, 
proportional to the fraction of occupied probe sites. In deterministic setting, the fraction of probe 
sites occupied by probe-target complexes is then given by the reaction rate equation known as the 
Langmuir equation 

— = d\X — {d\X + d 2 )u. (1) 
dt 

The corresponding equation incorporating the stochastic noise associated with both target and non- 
target specific hybridization is given by the following stochastic version of ([T]) herein refered to as 
the stochastic Langmuir equation. It has the form 

= dix - (dix + d 2 )u + a/ gju) Z t , (2) 

where g(u) > is a known function of u and Z t is a Gaussian white noise process with unit variance. 
The model described by the above equation is known in the literature as the stochastic Langmuir 
model and is a special case of a diffusion model considered e.g., by Dennis and Patil ^ in their study 
of stochastic fluctuations of populations about their stable equilibria. We note that in the present 
context of the Langmuir adsorption-desorption model, the solution of the stochastic equation (j2J) is 
no longer bounded and thus ([2]) suffers an obvious drawback in the fact that the function u has no 
physical interpretation for u > 1. 

We also note that (J2J) is concerned with a single DNA (oligo) probe only, with the effect of other 
probes replaced by a random noise term (stochastic forcing). In the context of modeling a reaction 
network of simultaneous hybridization reactions on a DNA-microarray (j2J) is therefore one of the 
simplest examples of a complex system decoupling and stochastic excitation (see, e.g., Majda et al. 
[sj], Katsoulakis et al. 0]). 

In general, under mild regularity conditions on g, one can argue that the stationary solution of ([2]) 
is approximately distributed as a gamma random variable around the deterministic system steady 
state 0]. However, as noted by Burden et al. [l[ for a linear choice of g, namely 

g(u) = Cd\xu (3) 

with C > the equation ([2]) has an exact stationary gamma solution. That fact may be easily 
inferred from the corresponding Fokker-Planck equation (see, e.g., the monographs by van Kampen 



15] or Ethier and Kurtz [3j and the references therein) which written in terms of the density of u, 
say ip(u,t), is given b)Q 

dibiu.t) d f r/ , , s C dix d\uib(u, t)] 1 

J [{d lX + d 2 )u - d lX }tfj{u,t) + — — 1 a J r- ( 4 ) 



dt du " 2 du 



1 Herein we are primarily concerned with modeling the internal fluctuations of the stochastic system modeled 
by §2$. Accordingly, we interpret the stochastic equation @ in the sense of Ito calculus. For the discussion of an 
alternative Fokker Planck equation (TJ]) using the Stratonovich calculus, see e.g., Dennis and Patil or, for more 



detail, van Kampen 15j. 
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Solving for the steady state density, say ipo(u), gives 



Mu) oc u 2tc ~ x exp f -2 dl *j£ 2 U ) forM e[0,oo) (5) 



and zero otherwise. 



From the above considerations we see that adopting the stochastic model (j2J) with no additional 
assumptions may result in a stationary solution ipo(u) being an absolutely continuous distribution 
with positive support on (0, oo). This finding is, however, not consistent with the experimental data 
which suggests that, at least for some values of the parameters, the saturated state u = 1 should 
have positive probability. We also note an apparent lack of physical interpretation for the values 
it > 1 in the context of the original Langmuir model. 

An alternative approach to modeling the dynamics of hybridization (or absorption-desorption) reac- 
tion is to analyze directly an underlying discrete stochastic system which (T5]) intends to approximate. 
We note that in our setting we have a simple one dimensional BD process described by one chemical 
species Cmpx i.e., the amount of probe-target complex or, in other words, the number of occupied 
nucleotides in the probe region. Hence, we consider a system of two coupled chemical reactions 

— * Cmpx 
Cmpx (6) 

where b(-) and d(-) are system-state dependent birth and death rates, respectively. In order to 
describe our approach we need to specify the form of these rate functions. To this end we shall 
define a discrete, finite-state version of the stochastic Langmuir adsorption-desorption. 
Definition 2.1 (LBD Process). Langmuir BD process is any BD process with the set of states 
{0, . . . , N} and the birth and death rates of the form 

b(k)= ci(N -k)+C(k,N) 
d(k)= c 2 k + C(k,N) 

for k — 0, . . . , N. Here c\, c 2 are some positive real constants and the function C(-, N) is assumed 
to be of the form 

C(k,N) = ^-g(k/N) (7) 

for < k < N and to satisfy the boundary conditions ensuring the finiteness of the system space, 
i.e. C(0,N) = C(N,N) = 0. 

In the LBD process the terms C\(N — k) and c 2 k are linear rates of birth and death as suggested 
by the deterministic Langmuir model ([1]). The additional term C(-,N) introduced into b(k) and 
d(k) is intended to model the noise of the non-target adsorption and desorption. For instance on a 
GineChip C(-,N) accounts for the competition for the same RNA targets between different probe 
regions with similar nucleotide sequences. Assumption (I7j) implies the "density-dependent" form 
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for the rates b(-),d(-) (see, e.g., [3( chapter 11) with the noise term C(-,N) being of higher order 
than the terms ci(N — k) and c 2 k. 

Note that the (infinite) BD process with the boundary-free rates (i.e., d(k) and b(k) given as in 
Definition 12.11 but without requiring that C(0,N) = C(N,N) = 0) may be approximated by the 
solution of the Langmuir stochastic equation for large N. This may be informally argued as 
follows. Let k t be the state of the system (Q at t > 0, described as a difference of two independent 
unit Poisson processes, say Yx,Y-i, with random time changes (see, e.g., chapter 6) 

k t = k + Y x (j b{k s )ds^j - y_! (J d{k s )ds 

Since for any unit Poisson process Y and large N we have N~ 1 ' 2 (Y(Nv ) — Nv) ~ W(v) for any 
real v with W(v) being the standard Brownian motion (SBM), the Poisson processes Yi,F_i may 
be approximated for large N by independent SBM processes, say Wi, W_\. Denoting u(t) = k t /N 
this diffusion approximation of (jHJ) is 



u{t) =N- 1 k + N-^Wt (J [c x (l - u(s)) + ^g(u(s))]ds 



- N-^W-! U\ c Ms) + ^g(u(s))}ds^ + j\ Cl (l 



u(s)) + c 2 u(s)]ds. 



which is distributionally equivalent to 

u(t)= [ [ci(l - u(s)) + c 2 u{s)}ds + [ y/g(u(s))dW{s) + o P (l) (9) 
Jo Jo 

and hence in the limit to the integral version of ([2]) (see Ethier and Kurtz [3] chapter 11 for 
details). 

Of course, depending on the form of C(-,N) we shall have different forms of the LBD process. In 
order to cast our results somewhat parallel to the model ([2]) under a linear form of g in Q, herein 
we consider only C(k,N) given by the functions C^C^C^ defined below, with the corresponding 
models henceforth referred to as Mi,M 2 , M3, respectively. 

C x {k, N) = c 3 Nk for < k < N and d(N, N) = (Mi) 

C 2 (k, N) = c 3 N(N -k) for < k < N and C 2 (0, N) = (M 2 ) 
C 3 (k, N) = c 3 k(N -k) for < k < N (M 3 ) 

Note that if we disregard the boundary condition Ci(N,N) = then the model M\ is a discrete 
analogue of (J2J) with the choice of g given by in the sense that BD process © (or (JH])) may 
be approximated by (Q for large N, leading to the Fokker-Planck equation given in (jlj) and, 
consequently, to (jSJ) with 

Ci = d\X 

c 2 = d 2 (10) 
C3 = Cd\x/2. 
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This casting of the equation (j2J) as an approximation to M\ gives also additional insight into the 
somewhat mysterious choice of the form of function g in ([3]). Considering the rates in Mi it becomes 
clear that the choice of ([3]) is a reflection of two implicit assumptions concerning the microarray 
hybridization reactions, namely that (i) the level of the target-specific signal in the probe region 
has lower magnitude than the level of non-specific signal (i.e., signal noise) and (ii) the non-specific 
signal noise is proportional to the total system (i.e., probe region) size as well and the current 
system state and the target concentration. 

Note that the model Mi is simply a 'reflection' of M\ obtained by considering the amount of 
unoccupied probe region N—Cmpx instead of the amount oiCmpx. Model M2 is thus not concerned 
with saturation but rather with threshold effect of the probe adsorption. This phenomena occurs 
when the LBD process attains an empty state with positive probability. 

Note also that both Mi and M 2 rate functions have discontinuities at the boundary. This is in 
contrast with the model M 3 which enjoys smooth boundary conditions with no discontinuities. In 
general such discontinuities in rate functions for BD processes prevent the direct application of an 
approximation of the form ([§]), however it turns out that for an LBD processes M\ — M 3 their 
stationary distributions may be approximated more directly. 

3 Limit Theorem 

In this section we state and prove the main result of the paper, namely the limit theorem for the 
stationary distributions of the LBD processes under the models M\ — M 3 . The proof we give herein 
is quite elementary and is based on the fact that for one dimensional birth-and-death processes 
with bounded state space and polynomial rates, the moments of their limiting distributions must 
be uniquely determined by the corresponding detailed balance (reversibility) conditions. At this 
point it is perhaps also worth noticing that even though herein we have restrict ourselves only to the 
models M\ — M3 as the most relevant for the type of asymptotic behavior described by (j2J) and ([3]), 
it is not difficult to see that the method of the proof allows one to easily extend the result to any 
LBD processes with polynomial-type birth and death rates. This, at least in principle, allows then 
to obtain limit theorems for the discrete versions of ([2]) with any function g continuous on (0,1) and 
continuously extendable to [0,1] where it may be always uniformly approximated by polynomials. 
However, such considerations go beyond our current scope. 

In order to state the theorem we shall need some additional notation. For z, 7 > denote the incom- 
plete gamma function by T(z, 7) = J Q 7 s z_1 exp(— s) ds and for any a, (3 > let IG(a, (3, 1) denote an 
incomplete gamma random variable with the density function f a ,p{%) = r(a, /3)~ l [3 a exp(— xf3) 
for x G (0, 1) and zero otherwise. Let us denote by F a $ the distribution function of IG(a,/3, 1). 
We introduce the following definition. 

Definition 3.1 (LIG Distribution). We say that the random variable has the Langmuir-incomplete 
gamma (LIG) distribution with parameters a, (3 satisfying (3 > a > if its distribution function is 
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given by the mixture 

G — (1 — -K a ^)F ay p + TT a)l 3 5i 

where F a $ is the distribution function for IG(a, /3, 1) random variable, S\ is the distribution function 
of a degenerate random variable with mass concentrated at one and 

Za ' p = (3 a + T(a,p) (P-a)exp(P)' 



Below we denote the Langmuir incomplete gamma distribution with parameters a, (5 by LIG(a, f3). 
We shall also denote by Beta(a,f3) the usual beta distribution with parameters a, (3 > and the 
density h(x) = T(a + /5)r(a)- 1 r(/3)" 1 £ x a 1 (1 — x) 13 1 dx for x G (0, 1) and zero otherwise. We 
have the following 

Theorem 1 (Limit Theorem for a Stationary Distribution of an LBD Process). Let X$ be the 

stationary distributions of LBD Process Mi for i = 1,2,3, and let a = ci/c% and b = (ci + c-i)lc?>, 
as well as Y$ = X^/N. Then, as N — >• oo we have the weak convergence 

> Zi % = 1, 2, 3 

where the limiting random variables Zi are as follows 

(i) Z\ is LIG(a, b), 

(ii) Zi is such that 1 — Z 2 is LlG(b — a, b), 
(Hi) Z% is Beta(a, b — a). 

Before discussing the proof of this result, some remarks are perhaps in order. 

• As it shall become clear from the proof, it turns out that all any LBD processe (hence, 
also Mi — M3 ) have the correct Langmuir mean given by the stationary solution of the 
deterministic equation ([1]) with the c\ and c 2 constants as in (jTUI) . In that sense an LBD 
process may be viewed as a discrete analogy of continuous models and fl2]) with specific 
functions g in the latter related to LBD via (J7J). One should stress, however, the fundamental 
difference between the approach to approximating a stochastic equilibrium of a discrete system 

offered by Theorem [1] and that based on analyzing the equilibrium distribution of the 
diffusion approximation (j2J) outlined in (jSJ) and (Q. In Theorem [1] one considers a sequence 
of stationary distributions of LBD processes indexed by the size of the state space iV and 
derives its limit as iV increases. The approximation via (j2j) is based on approximating the 
entire discrete process (not just its equilibrium distribution) for large iV and then deriving a 
stationary distribution of the approximation. 

• Despite the very different models behind them, if w then (i) of Theoem 1 specializes 
to the result on the stationary density ([SD obtained via (j3J). Thus when 7r 0j j m our theorem 
fomally justifies the use of gamma approximation for modeling hybridization reactions under 
the boundary- free model described by (T5]) and the assumption (j3J). 
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• If the condition 7r 0) 6 ~ is not satisfied then there may be a significant difference between 
the stationary distribution obtained from the boundary-free type analysis via the stochastic 
equation (j2J) and the LBD process analysis. This is due to the fact that the LBD analysis 
takes properly into account the discontinuities in the rate functions whereas the continuous 
model (j2J) does not. 

• The theorem indicates that both Mi and M 2 models which incorporate the linear noise term 
C(-, N) into their rate functions are amenable to the gamma- type approximation of their sta- 
tionary distributions perhaps after some adjustment for the bounded state-space. In contrast, 
the model M3 with the quadratic and 'boundary symmetric' noise term yields a different type 
of stationary distribution (i.e, beta) with no boundary effects. 

It seems that there are several ways of arriving at the result of the theorem. Herein we have chosen 
the method of the proof which is perhaps slightly convoluted but on the other hand almost com- 
pletely elementary and thus fully accessible to readers without extensive background in stochastic 
processes theory. 

In order to provide a proof of Theorem 1 we shall need two auxiliary results stated below as 
Lemmas 1 and 2. The first one of them concerns some elementary properties of the moments of a 
LIG distribution. 

Lemma 1. Let Z be a random variable distributed according to LIG(a, (3) . Then for any integer 
m > we have 

EZ m+1 = ^±^EZ™ - (12) 



Proof of LemmaUl Let W be a random variable distributed according to IG(a,/3, 1). Elementary 
calculation based on the integration by parts shows that for any integer m > 



EW 



m+l 



m + a 



EW r 



(3 a ~ l exp(-/3) 



(3 " r(a,/3) 
In view of the above and by the definition of Z we have for any integer m > 

EZ m+l = ^ _ ^ EW m+l 



(1 - Kafi) 

m + a 



7T, 



m + a 



EW 



P*- 1 exp(-/3) 



V{a, (3) 



P 

m + a 



[(1 
[(1 



Hi Zi 7T„, ft . 

(3 (3 



7T Qi/3 ) EW m + 7T Qj/3 ] + 

EW m 



m 



a 







K a ,{3 - (1 - -Ka^) 



7T, 



m (3 — a 



1-7T, 



(3 a - x exp(-P) 

V(a,f3) 
, I3 a - X exp(-P) 



□ 
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Our second lemma is as follows. 

Lemma 2. Let a, (3 > be arbitrary. Consider N — ► oo. For any non-increasing real sequence 
&n I Qf > satisfying (a/v — a ) logA" — ► and an?/ rea/ sequence fa — > [3 > we nave 

AT— g r( ^ + fc) (l - - jf 1 x"- 1 e"* dx = /T Q r(a, /?). 

Proof of Lemma® Assume first that = «• For given a, /3 > define k(N) = [5 log AT] where 6 
is a fixed positive number such that S < a and [x] denotes the largest integer not greater than x. 
Write 

, v -q y r(« + fc) ( fa\ k _ r(q + fc) / ^ \ fe " r(q + fc) / g^v 

^ fc! I A^ J ^ fc! I A^ J ^ ^ fc! V ^ / 

fc=0 v 7 fc=0 v 7 fc=fc(JV)+l v 7 

= (/) + {II) 

We first show (J) — > as A" — > oo. To this end note 

(i) < n-° y: {a k t ^ N ' a e 



fc! ^ fc! 

fc=0 fc=0 
fc(JV) 

l« -t- 

fc! 



— jM — 



k=0 

Now we argue that 

(17) -»• fa a T{a, (3) as N oo. (13) 
To this end, recall the following version of the Gauss formula 

^^-1 as fc^oc. (14) 
fcifc"- 1 v ' 

In view of the above it follows that for any given e G (0, 1) and A" sufficiently large we have 

N N 

(1 - e)N~ a k a ~ 1 e~ kl3/N < (II) < (1 + e)N~ a ^ k a ~ l e^ /N . 

k=k{N)+l k=k{N)+l 

Since the expression N~ a J2k=k(N)+i fc^e"^ = N" 1 J2k=k(N)+i( k / N ) a ~ le ~ km is seen to be the 
Riemann sum for (3~ a T(a, (3) taking A" — > oo gives 

(1 - e)l3- a T(a, (3) < lim (II) < (1 + e)l3- a Y(a, /3). 

Since e may be taken arbitrarily close to zero, the relation (TT3"]) follows and yields the assertion of 
the lemma with = a. To complete the proof for an arbitrary sequence note that we need 
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in essence only to argue that N aN ~ a —>■ 1 as iV — > oo (this follows by assumption) and that the 
formula fjX4jl holds with a replaced by a k (since a k is monotone). By the continuity of gamma 
function, T{a) /T{a^) — > 1 as N — > oo. This and (fT4l) entail 

r(a fc ) a (a + 1) - • • (at + k — 1) 

— - — : ► 1 as k — > oo. 

k\ k a ~ x 

The relationship (TH1) with a replaced by a k will now follow if we can argue that 

±J :a k + s 

s=0 

as k — > oo. To this end note that 

l0 § fll ^) = E lo S <(«*-«)E^<2(«*- a) log k^ 

by our assumption on a k and thus (fT5l) follows. This, however, yields the assertion of the lemma, 
since for sufficiently large N 

N - a V ^ + fc ) fi - < AT" V F ^ + ^ fi - < AT- V ^ ± ^ (l-fay 

^ k\ \ N ~ L> k\ \ N J - k\ \ N J 

k=k{N) v 7 k=k(N) x y k=k(N) x y 

and we have just shown that the first and the last of the expressions above tend to (3~ a T(a,f5) as 
N -> oo. □ 

Having established the assertions of the lemmas above, we are finally in a position to prove the 
result given in Theorem 1. 

Proof of Theorem [TJ Part (i). Denote by X the random variable X$ and set P(X = k) = p{k) 
for k — . . . , N. Let m > be an integer. Multiplying by k m the detailed balance equation 

p(k + 1) d(k + 1) = p(k) b(k) (16) 

and then summing over k = 0, . . . , N — 1 we obtain under Mi model 

JV-l N-l 

{c 2 + c 3 N) k m {k + 1) p{k + 1) - c 3 {N - l) 1 p{N) N 2 = k m [p{N){ Cl {N -k) + c 3 Nk)}. 

k=0 k=0 

Expanding now k m = (k + 1 — l) m on the right hand side we get 

N—l m / x 

(c2 + c 3 N)J2J2[j ( k + + 1) - c 3 (iV - l) m iV 2 p(N) 

k=0 s=0 x ' 

N 

= ^[c x (iV - fc) + c 3 Nk] k m p{k) - c 3 N m+2 p(N). 

k=0 
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Denoting fi m (N) = EX m we may rewrite the above relationship as 

(c 2 + c 3 N) £ ( ) (-l) m " s /i m+ i (iV) - c 3 (N - l) m iV 2 p(JV) 

= d iV/im(^) + (c 3 N- Cl )fi m+1 {N) - c 3 N m+2 p(N). 

We set /i m = liniiv^oo jl m (N)/N m . Note that dividing both sides by N m+1 and taking iV — > oo 
gives 

(ci + c 2 )/i m +i = (mc 3 + ci)/i m - c 3 mp* 

where 

p* = lim p(N). (17) 

JV^oo 

Assuming for a moment that p* exists and is finite (this follows from (120]) below) we see that these 
considerations give the following recursive relationship for the limiting moments of 

m + ci/c 3 mp* 
AWi = 7 ; rr~ - t : w— for m = 0, 1 . . . . 

(Ci + C 2 J/C 3 (Ci + C2J/ c 3 

or in terms of a, b 

m + a m 

/Wi = — ^ — fJm ~ for m = 0, 1 . . . . (18) 

We note that in view of fio = 1 the solution to the above recursive equation is unique. Moreover, 
we note that since the support of the sequence Ojv }jv=i ^ s contained in the closed interval [0,1] 
then (i) the corresponding sequence of probability measures is tight and (ii) any of its weak limits 
must be a probability measure whose moments satisfy (JT8l) . Since the probability distributions on 
bounded intervals are uniquely determined by their moments it follows that as iV — > 00 

Y<P $ Zy (19) 

for some random variable Z\ with moments \x m given by (|T8|) . To complete the proof of part {%) 
we need to show only that Z\ is a LIG(a, b) random variable as given in Definition 13.11 Since Z\ is 
identified completely by its moments, it suffices to show that the moments of the random variable 
LIG(a,b) satisfy the recursive relation (1181) . This follows by Lemma 1 provided that 

P* = K a ,b (20) 

where 7r afe is given by (ITT]) . 

In order to argue (1201 we again consider the detailed balance equation (fTBj) 

p(N- l)b(N- l)/d{N) 



p(N)=p(N-l)b(N-l)/d(N) = — ,, v „ Y _, 

p(N - 1) b(N - l)/d{N) + Zto P( k ) 

p{N -l)b{N -l)/d{N) A N 



P(N - 1) b(N - l)/d(N) + ELo P(k) + 1 



(21) 
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where we define 

^_ p(N-l)b(N-l)/d(N) 
We note that again by (ITtjj) we get under the model M 1 the following form of A N 



c 2 +c 3 n f C3 N- Cl \ N nS^+sfe: 



C 2 N\ \c 2 + C 3 Nj v Af-l i f CaN - Cl \ k rrfc-l/ , 

2^fc = fc! ^ C2 + C3 Ary llsrrOV 6 " 1 " 



c 2 + c 3 iV ZcgiV-cA^ r(AT+^ 



c 2 iV! lc 2 + c 3 iV 



C3 N—ci - 

c 3 AT-ci') 



Z^fc=0 fc! ^c 2 +c 3 Afy 1 V*' ~ c 3 JV-ci - 



Denote 



then 



A 



A? 



= CliV 6 = ( c i + C 2) W 

C 3 A^-Ci A C 3 iV + C2 



(b-a + N) T(N + a N ) (l-b N /N) N 



Applying now the Gauss formula (114p and using the result of Lemma 2 we conclude that 

If 



lim A 



"aT n exp(b)T(a,b) 

and hence (J20l) follows by (pill which completes the proof of part (i) of the theorem. 

(2) 

Part (ii). The result follows by applying part (2) to the random variable N — X-fr . 

Part (in). For the proof of the last part of the theorem we denote now by X the random variable 
X N and otherwise retain the notation from part (i). Multiplying ( fl6l) by k m and summing as 
before, we obtain under M 3 

N N 

k m p(k + l)[c 2 (k + 1) + c 3 (k + 1)(N - k - 1)] = J2 k m p(k)[ Cl (N - k) + c 3 k(N - k)}. 

fc=0 fc=0 

The above, by an argument similar to the one used in (i), gives the relationship 

m / s m , x 

(c 2 +c 3 N)^2(-l) m ~ s ( j/Wi-c 3 ^(-l) m ~ s f J jx s+2 = CiN jx m -dfi m+ i+c 3 N Jx m+l -c 3 li m+2 . 
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Denoting fi m = liniAr^oo fi m (N)/N m , dividing both sides by N m+1 and taking iV — ► oo gives 
somewhat simpler then (j!8p recursion formula for the limiting moments, namely 

m + Cl /c 3 

fJm+i = — f ; rr~ for m = 0, 1 . . . . 

m + (ci + c 2 )/c 3 

Writing the above in terms of a, b 

m + a 

fx m+1 = — fjL m for m = 0, 1 . . . 

m + 

we obtain the familiar relationship between the moments of Beta(a, b — a) distribution. This, along 
with the tightness of measures argument similar to the one used in (i) completes the proof of part 
(Hi). □ 



4 Conclusions 

Herein we have derived a limit theorem for stationary distributions of some special birth-and- 
death processes related to the Langmuir dynamic adsorption-desorption model. Such a model is 
of interest in the context of the microarray hybridization reactions if one may assume that the 
fluorescence signal on the array is approximately a realization of a chemical Langmuir equilibrium 
of the adsorption and desorption reactions between the target mRNA molecules and the DNA 
probes. Whereas this assumption may be questionable for long (hundred basis or more) probes, it 
seems reasonable for the short ones, like e.g., the 25-mers used on many Affymetrix chips. Indeed, in 
the context of Affymetrix GeneChip arrays, the gamma-type approximation to the gene expression 
data based on an ad-hoc Langmuir-like equilibria argument have been proposed in the literature 
as a way of enhancing the data analysis. Our current result gives a rigorous justification of the use 
of truncated-gamma and beta-type distributions in order to approximate the fluorescence readings 
of the probe- RNA complexes obtained in course of an Affymetrix microarray experiment. It also 
explains some experimentally observed behavior of these readings like e.g. the signal saturation and 
the signal thresholding phenomena. 

The potential usefulness of our approximation results stems also from the fact that they allow one 
to describe the theoretical means of measured fluorescence intensity readings by three parameter 
hyperbolic response functions which can be obtained as solutions of the corresponding deterministic 
Langmuir equations. In general, these response functions for specific probes shall be only sequence 
dependent and could be therefore used universally in all experiments involving a particular probe 
sequence. Our results imply also that the fold changes in RNA-target concentration are not linearly 
related to fold changes in fluorescence intensity readings, as is often generally assumed. 

As pointed out by some authors (cf. Wu and Irizarry [l6j]) the formidable challenge in microarray 
experiments is to establish a reliable algorithm for extracting the true RNA concentrations mea- 
surements from the probes fluorescence intensity readings. We believe that the results of this paper 
could perhaps get us a step closer to that goal. 
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